Finite-state phrase parsing by rule sequences
Abstract
We present a novel approach to parsing phrase grammars based on Eric Brill's notion of rule sequences. The basic framework we describe has somewhat less power than a finite-state machine, yet it achieves high accuracy on standard phrase-parsing tasks. The rule language is simple, which makes rules easy to write. Further, this simplicity enables the automatic acquisition of phrase-parsing rules through an error-reduction strategy.

This paper explores an approach to syntactic analysis that is unconventional in several respects. To begin with, we are concerned less with the traditional goal of analyzing the comprehensive structure of complete sentences than with assigning partial structure to parts of sentences. The fragment of interest here is demonstrably a subset of the regular sets, and while these languages are traditionally analyzed with finite-state automata, our approach relies instead on the rule-sequence architecture defined by Eric Brill.

Why restrict ourselves to the finite-state case? Some linguistic phenomena are easier to model with regular sets than with context-free grammars. Proper names are a case in point: their syntactic distribution only partially overlaps that of noun phrases in general. Because the overlap is only partial, name analysis within a full context-free grammar is cumbersome, and some approaches have instead included finite-state name parsers as a front end to a principal context-free parsing stage (Jacobs et al. 1991). Proper names are of further interest because their identification is independently motivated as valuable to both information retrieval and information extraction (Sundheim 1996). Further, several promising recent approaches to information extraction rely on little more than finite-state machines to perform the entire extraction analysis (Appelt et al. 1993; Grishman 1995).

Why approach this problem with rule sequences?
In this paper we make the case that rule sequences succeed at this task through their simplicity and speed. Most importantly, they support mixed-mode acquisition: the rules are both easy for an engineer to write and easy to learn automatically.
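The abstract names two ingredients: applying an ordered sequence of rules to an initial labeling, and acquiring that sequence automatically by error reduction. The Python sketch below illustrates both in the general style of Brill's transformation-based learning. It is a minimal illustration under stated assumptions, not the authors' system: phrases are encoded as per-token labels (`NAME` inside a proper name, `O` outside), and the rule schema (relabel a token when its word, its part of speech, or a neighbour's part of speech matches) as well as all names (`Rule`, `apply_sequence`, `learn`, `baseline`) are hypothetical.

```python
from collections.abc import Callable
from typing import NamedTuple

# A token is (word, part_of_speech). Phrases are encoded as per-token labels:
# "NAME" inside a proper-name phrase, "O" outside. This encoding and the rule
# schema below are illustrative assumptions, not the paper's representation.
Token = tuple[str, str]
FEATURES = ("word", "pos", "prev_pos", "next_pos")

class Rule(NamedTuple):
    """Hypothetical rule: relabel a token from `src` to `dst` when `feature` == `value`."""
    src: str
    dst: str
    feature: str
    value: str

def feature_value(tokens: list[Token], i: int, feature: str) -> str:
    """Value of `feature` at position i, with boundary markers at sentence edges."""
    if feature == "word":
        return tokens[i][0]
    if feature == "pos":
        return tokens[i][1]
    if feature == "prev_pos":
        return tokens[i - 1][1] if i > 0 else "<s>"
    return tokens[i + 1][1] if i + 1 < len(tokens) else "</s>"

def apply_rule(rule: Rule, tokens: list[Token], labels: list[str]) -> list[str]:
    """Apply one rule at every matching position. Conditions never inspect
    neighbouring labels, so the order of positions does not matter here."""
    return [
        rule.dst
        if label == rule.src and feature_value(tokens, i, rule.feature) == rule.value
        else label
        for i, label in enumerate(labels)
    ]

def apply_sequence(rules: list[Rule], tokens: list[Token], labels: list[str]) -> list[str]:
    """Apply rules in order; each rule sees the output of the rules before it."""
    for rule in rules:
        labels = apply_rule(rule, tokens, labels)
    return labels

def learn(corpus: list[tuple[list[Token], list[str]]],
          initial: Callable[[list[Token]], list[str]],
          max_rules: int = 10) -> list[Rule]:
    """Greedy error reduction: at each step keep the candidate rule with the
    largest positive net gain (errors fixed minus errors introduced)."""
    current = [initial(tokens) for tokens, _ in corpus]
    learned: list[Rule] = []
    for _ in range(max_rules):
        # Propose candidates only from positions that are currently mislabeled.
        candidates = {
            Rule(label, gold_label, f, feature_value(tokens, i, f))
            for (tokens, gold), labels in zip(corpus, current)
            for i, (label, gold_label) in enumerate(zip(labels, gold))
            if label != gold_label
            for f in FEATURES
        }
        best, best_gain = None, 0
        for rule in sorted(candidates):  # sorted: deterministic tie-breaking
            gain = sum(
                (new == g) - (old == g)
                for (tokens, gold), labels in zip(corpus, current)
                for new, old, g in zip(apply_rule(rule, tokens, labels), labels, gold)
            )
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no candidate reduces the error count any further
            break
        learned.append(best)
        current = [apply_rule(best, tokens, labels)
                   for (tokens, _), labels in zip(corpus, current)]
    return learned

def baseline(tokens: list[Token]) -> list[str]:
    """Trivial initial labeling: every token starts outside any name."""
    return ["O"] * len(tokens)

corpus = [
    ([("Mr.", "NNP"), ("Smith", "NNP"), ("left", "VBD")], ["NAME", "NAME", "O"]),
    ([("IBM", "NNP"), ("hired", "VBD"), ("Jones", "NNP")], ["NAME", "O", "NAME"]),
    ([("Jones", "NNP"), ("left", "VBD"), ("Monday", "NNP")], ["NAME", "O", "O"]),
]
rules = learn(corpus, baseline)
print(rules)
# → [Rule(src='O', dst='NAME', feature='pos', value='NNP'), Rule(src='NAME', dst='O', feature='word', value='Monday')]
new_sentence = [("Lee", "NNP"), ("left", "VBD"), ("Monday", "NNP")]
print(apply_sequence(rules, new_sentence, baseline(new_sentence)))  # → ['NAME', 'O', 'O']
```

On this toy corpus the first learned rule over-generates (it also marks "Monday" as a name), and the second rule, learned afterwards, patches exactly that error. This dependence of later rules on the output of earlier ones is what makes the result a rule *sequence* rather than an unordered rule set, and the same simple rule format is one an engineer could also write by hand.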
Publication date: 1996